Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates on whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process: after adoption, the projects were faster
both at reviewing PRs that ended up merged and at resolving PRs that ended up closed.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.
Comment: Manuscript submitted to ACM Transactions on Software Engineering and Methodology
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
The success of a Pull Request (PR) depends on the responsiveness of the
maintainers and the contributor during the review process. Being aware of the
expected waiting times can lead to better interactions and managed expectations
for both the maintainers and the contributor. In this paper, we propose a
machine-learning approach to predict the first response latency of the
maintainers following the submission of a PR, and the first response latency of
the contributor after receiving the first response from the maintainers. We
curate a dataset of 20 large and popular open-source projects on GitHub and
extract 21 features to characterize projects, contributors, PRs, and review
processes. Using these features, we then evaluate seven types of classifiers to
identify the best-performing models. We also perform permutation feature
importance and SHAP analyses to understand the importance and impact of
different features on the predicted response latencies. Our best-performing
models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for
maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors
compared to a no-skill classifier across the projects. Our findings indicate
that PRs submitted earlier in the week, containing an average or slightly
above-average number of commits, and with concise descriptions are more likely
to receive faster first responses from the maintainers. Similarly, PRs with a
lower first response latency from maintainers, that received the first response
of maintainers earlier in the week, and containing an average or slightly
above-average number of commits tend to receive faster first responses from the
contributors. Additionally, contributors with a higher acceptance rate and a
history of timely responses in the project are likely to both obtain and
provide faster first responses.
Comment: Manuscript submitted to IEEE Transactions on Software Engineering (TSE)
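
The comparison against a no-skill classifier mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that "improvement" means the relative gain over the baseline score, where a no-skill classifier scores 0.5 in AUC-ROC and the positive-class prevalence in AUC-PR; the abstract does not state the exact formula, so this is an assumption. The labels and scores are hypothetical toy values, not data from the paper, and the function names are illustrative only.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive is found
    return total / hits


def relative_improvement(model_score, baseline_score):
    """Percentage gain of the model over the baseline (assumed definition)."""
    return 100.0 * (model_score - baseline_score) / baseline_score


# Hypothetical toy data: 1 = "fast first response", 0 = otherwise.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

# No-skill baselines: 0.5 for AUC-ROC, positive-class prevalence for AUC-PR.
baseline_roc = 0.5
baseline_pr = sum(y_true) / len(y_true)

roc_gain = relative_improvement(auc_roc(y_true, scores), baseline_roc)
pr_gain = relative_improvement(average_precision(y_true, scores), baseline_pr)
print(f"AUC-ROC gain: {roc_gain:.1f}%  AUC-PR gain: {pr_gain:.1f}%")
```

Because the AUC-PR baseline equals the class prevalence, the same model can show a much larger relative gain in AUC-PR than in AUC-ROC when the positive class is rare.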
On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests
Pull-based development has enabled numerous volunteers to contribute to
open-source projects with fewer barriers. Nevertheless, a considerable number
of pull requests (PRs) with valid contributions are abandoned by their
contributors, wasting the effort and time put in by both the contributors and
maintainers. To better understand the underlying dynamics of
contributor-abandoned PRs, we conduct a mixed-methods study using both
quantitative and qualitative methods. We curate a dataset consisting of 265,325
PRs including 4,450 abandoned ones from ten popular and mature GitHub projects
and measure 16 features characterizing PRs, contributors, review processes, and
projects. Using statistical and machine learning techniques, we find that
complex PRs, novice contributors, and lengthy reviews have a higher probability
of abandonment and the rate of PR abandonment fluctuates alongside the
projects' maturity or workload. To identify why contributors abandon their PRs,
we also manually examine a random sample of 354 abandoned PRs. We observe that
the most frequent abandonment reasons are related to the obstacles faced by
contributors, followed by the hurdles imposed by maintainers during the review
process. Finally, we survey the top core maintainers of the studied projects to
understand their perspectives on dealing with PR abandonment and on our
findings.
Comment: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM)